Kunihiro ASADA Makoto IKEDA Satoshi KOMATSU
This paper summarizes power reduction methods applicable for VLSI bus systems in terms of reduction of signal swing, effective capacitance reduction and reduction of signal transition, which have been studied in authors' research group. In each method the basic concept is reviewed quickly along with some examples of its application. A future perspective is also described in conclusion.
Hiroshi TAKAHASHI Shintaro MIZUSHIMA
High-speed and low-power DSPs have been developed for versatile applications, especially for digital communications. These DSPs contain a 16-bit fixed point DSP core with multiple buses, highly tuned instruction set and low-power architecture, featuring 0.45 mA/MIPS, 100-120 MIPS performance by a single CPU core, 200 MIPS performance by dual CPU core architecture, respectively and also contain a 1.2 V low-voltage DSP core with 30 MIPS performance for super low-power applications. In this paper, new architecture VIA2 programming ROM for high-speed and new D flip-flop circuit considering the impact of pocket implantation process for low power are discussed, including key C-MOS process technology.
Koji INOUE Tohru ISHIHARA Kazuaki MURAKAMI
This paper proposes a new approach to achieving high performance and low energy consumption for set-associative caches. The cache, called way-predicting set-associative cache, speculatively selects a single way, which is likely to contain the data desired by the processor, from the set designated by a memory address, before it starts a normal cache access. By accessing only the single way predicted, instead of accessing all the ways in a set, energy consumption can be reduced. In order for the way-predicting cache to perform well, accuracy of way prediction is important. This paper shows that the accuracy of an MRU (most recently used)-based way prediction is higher than 90% for most of the benchmark programs. The proposed way-predicting cache improves the ED (energy-delay) product by 60-70% compared to the conventional set-associative cache.
Kenichi OSADA Hisayuki HIGUCHI Koichiro ISHIBASHI Naotaka HASHIMOTO Kenji SHIOZAWA
We fabricated a 16-kB cache macro using 0.35-µm quadruple-metal CMOS technology. This is a 285-MHz, two-port 16-kB (512256 b) cache macro that has a 2-ns access time. This high-speed performance is enabled by a hierarchical bit-line architecture that uses double global bit-line pairs (WGBs), and a high-speed timing-insensitive sense amplifier (ISA) that shortens the access time.
Young-Su KWON In-Cheol PARK Chong-Min KYUNG
A new flip-flop configuration for half-swing clocking is proposed to save total clocking power. In the proposed scheme, only NMOS's are clocked with the half-swing clock in order to make it operate without level converters or any additional logics which were used in the earlier half-swing clocking schemes. Vcc is supplied to the random logic circuits and flip-flops while Vcc/2 is supplied to the clock network and some parts of the flip-flop to reduce the power consumed in the clock network. Compared to the conventional scheme, the proposed flip-flop configuration can save the clocking power by 40%.
Nobuhide YOSHIDA Masahiro FUJII Takao ATSUMO Keiichi NUMATA Shuji ASAI Michihisa KOHNO Hirokazu OIKAWA Hiroaki TSUTSUI Tadashi MAEDA
An emitter coupled logic (ECL) compatible low-power GaAs 8:1 multiplexer (MUX) and 1:8 demultiplexer (DEMUX) for 10-Gb/s optical communication systems has been developed. In order to decrease the power consumption and to maximize the timing margin, we estimated the power consumption for direct-coupled FET logic (DCFL) and source-coupled FET logic (SCFL) circuits in terms of the D-type flip-flop (D-FF) operating speed and the duty-ratio variation. Based on the result, we used SCFL circuits in the clock-generating circuit and the circuits operating at 10 Gb/s, and we used DCFL circuits in the circuits operating below 5 Gb/s. These ICs, which are mounted on ceramic packages, operate at up to 10 Gb/s with power consumption of 1.2 W for the 8:1 MUX and 1.0 W for the 1:8 DEMUX. This is the lowest power consumption yet reported for 10-Gb/s 8:1 MUX and 1:8 DEMUX.
Won-Hyo LEE Sung-Dae LEE Jun-Dong CHO
In this paper, we introduce a high-speed and low-power Phase-Frequency Detector (PFD) that is designed using a modified TSPC (True Single-Phase Clock) positive edge triggered D flip-flop . The proposed PFD has a simple structure with using only 19 transistors. The operation range of this PFD is over 1.4 GHz without using additional prescaler circuits. Furthermore, the PFD has a dead zone less than 0.01ns in the phase characteristics and has low phase sensitivity errors. The phase and frequency error detection range is not limited as in the case of the pt-type and nc-type PFDs. Also, the PFD is independent of the duty cycle of input signals. Also, a new charge-pump circuit is presented that is based on a charge-amplifier. A stand-by current of the proposed charge-pump circuit enhances the speed of charge-pump and removes the charge sharing which causes a phase noise in the charge pump PLL. Furthermore, the effect of clock feedthrough is reduced by separating the output stage from up and down signal. The simulation results base on a third order PLL are presented to verify the lock in process with the proposed PFD and charge pump circuits. The proposed PFD and charge-pump circuits are designed using 0.8 µm CMOS technology with 5 V supply voltage.
In this paper, a novel application specific power optimization technique utilizing small instruction ROM which is placed between an instruction cache or a main program memory and CPU core is proposed. Our optimization technique targets embedded systems which assume the following: (i) instruction memories are organized by two on-chip memories, a main program memory and a subprogram memory, (ii) these two memories can be independently powered-up or powered-down by a special instruction of a core processor, and (iii) a compiler optimizes an allocation of object code into these two memories so as to minimize average of read energy consumption. In many application programs, only a few basic blocks are frequently executed. Therefore, allocating these frequently executed basic blocks into low power subprogram memory leads significant energy reduction. Our experiments with actual ROM (Read Only Memory) modules created with 0.5 µm CMOS process technology, and MPEG2 codec program demonstrate significant energy reductions up to more than 50% at best case over the previous approach that applies only divided bit and word lines structure.
Bao-Yu SONG Makoto FURUIE Yukihiro YOSHIDA Takao ONOYE Isao SHIRAKAWA
An NMOS 4-phase dynamic logic scheme is described, which is intended to achieve low-power consumption in the deep submicron design. In this scheme, the short-circuit current is eliminated, and moreover, the voltage swing of transition signals is reduced, resulting in enhancing power reduction effectively. First, distinctive features of this 4-phase dynamic logic are specified, as compared with the static CMOS logic and dynamic domino CMOS logic. Then, power simulations are attempted for the 4-phase dynamic logic, static CMOS logic, dynamic CMOS logic, and pass-transistor logic, by using a number of logic modules, which demonstrate that the NMOS 4-phase dynamic logic is the most power-efficient. Moreover, through the gate delay simulation, the capability of how many transistors can be packed in a logic block is also discussed.
Jin-Cheon KIM Sang-Hoon LEE Hong-June PARK
A half-swing clocking scheme with a complementary gate and source drive is proposed for a CMOS flip-flop to reduce the power consumption of the clock system by 43%, while keeping the flip-flop delay time the same as that of the conventional full-swing clocking scheme. The delay time of the preceding half stage of a flip-flop using this scheme is less than half of that using the previous half-swing clocking scheme.
Shoji KAWAHITO Junichi NAKA Yoshiaki TADOKORO
This paper presents a low-power video A/D conversion technique using features of moving pictures. Neighboring frames in typical video sequences and neighboring pixels in each video frame are highly correlated. This property is effectively used for the video A/D conversion to reduce the number of comparators and the resulting power consumption. A set of reference voltages is given to a comparator array so that the iterative A/D conversion converges in the logarithmic order of the prediction error. Simulation results using standard moving pictures showed that the average number of iterations for the A/D conversion is less than 3 for all the moving pictures tested. In the proposed 12 b A/D converter, the number of comparators can be reduced to about 1/5 compared with that of the two-step flash A/D converters, which are commonly used for video applications. The A/D converter is particularly useful for the integration to CMOS image sensors.
Dwi HANDOKO Shoji KAWAHITO Yoshiaki TADOKORO Akira MATSUZAWA
This paper presents a novel method of an on-sensor motion vector estimation. One of the key techniques is an iterative block matching algorithm using high-speed interpolated pictures. This technique allows us to estimate the video-rate (30 frame/s) motion vectors accurately from the motion vectors obtained at high-speed frames. The proposed iterative block matching reduces the computational complexity by a factor of more than one tenth compared to the conventional full search block matching algorithm. This property is particularly useful for the reduction of the power dissipation of video encoder. Another proposed technique is a high-speed non-destructive image sensing. This technique is essential to obtain high-speed interpolated pictures while maintaining high image quality in video-rate image sensing. The estimated power dissipation of the designed CMOS image sensor is sufficiently low, allowing us to achieve a totally low-power design of one-chip CMOS cameras integrating an image sensor and a video encoder.
Ri-A JU Dong-Ho LEE Sang-Dae YU
This paper describes a 10-bit 40-MS/s pipelined A/D converter implemented in a 0.8-µm double-poly, double-metal CMOS process. This A/D converter achieves low power dissipation of 36-mW at 5-V power supply. A 1.5-bit/stage pipelined architecture allows large correction range for comparator offset, and performs fast interstage signal processing. For high speed and low power operation, the sample-and-hold amplifier is designed using op-amp sharing technique and dynamic comparator. In addition, fully-differential folded-cascode op amp with gain-boosting stage is designed by an automatic design tool. When 10-MHz input signal is applied, SNDR is 55.0 dB, and SNR is 56.7 dB. The DNL and INL exhibit 0.6 LSB, +1/-0.75 LSB respectively.
Koichi TANNO Okihiko ISHIZUKA Zheng TANG
In this paper, a four-quadrant analog multiplier consisting of four neuron-MOS transistors and two load resistors is proposed. The proposed multiplier can be operated at only 1 V. Furthermore, the input range of the multiplier is equal to 100% of the supply voltage. The theoretical harmonic distortion caused by mobility degradation and device mismatchs is derived in detail. The performance of the proposed multiplier is characterized through HSPICE simulations with a standard 2.0 µm CMOS process with a double-poly layer. Simulations of the proposed multiplier demonstrate that the linearity error of 0.77% and a total harmonic distortion of 0.62% are obtained with full-scale input conditions. The maximum power consumption and 3 dB bandwidth are 9.56 µW and 107 MHz, respectively. The active area of the proposed multiplier is 210 µm 140 µm.
Hisayasu SATO Nagisa SASAKI Takahiro MIKI
This paper describes a flip-flop circuit using a directly controlled emitter-follower with a diode-feedback level stabilizer (DC-DF) and a resistor-feedback level stabilizer (DC-RF) for low-power multi-GHz prescalers. The new flip-flop circuit reduces the emitter-follower current and gains both high-frequency operation and low-power. A dual modulus (4/5) prescaler using this circuit technology was fabricated with a 0.35 µm BiCMOS process. The current draw of the prescaler using the DC-RF is 34% smaller than conventional LCML circuits. The DC-RF prescaler operates at 2.11 GHz with a total current consumption of 1.03 mA. In addition, the circuit operates with a supply voltage of down to 2.4 V by using the resistor level-shift clock-driver.
Fukashi MORISHITA Yasuo YAMAGUCHI Takahisa EIMORI Toshiyuki OASHI Kazutami ARIMOTO Yasuo INOUE Tadashi NISHIMURA Michihiro YAMADA
It is confirmed by simulation that SOI-DRAMs can be operated at high speed by using the floating body structures. Several floating body effects are analyzed. It is described that the dynamic retention characteristics are not dominated by capacitive coupling and hole redistribution. And it is described that those characteristics are determined by the leakage current in the small pn-junction region of the floating body. Reducing pn junction leakage current is important for realizing a long retention time. If the pn junction leakage is suppressed to 10-18 A/µm, a dynamic retention time of 520 sec at a VBSG of 0.5 V can be achieved at 27. On those conditions, the refresh current of SOI-DRAMs is reduced by 54% compared with bulk-Si DRAMs.
Tetsu TANAKA Youichi MOMIYAMA Toshihiro SUGII
Dynamic Threshold-Voltage MOSFETs (DTMOS) in which the body is connected to the gate provide extremely high transconductance for supply voltages as low as under 0.7 V. This is because the forward body-source bias lowers the threshold voltage, which results in large gate drive and large drain current. This paper describes the high frequency characteristics of DTMOS for the first time. The DTMOS we analyzed has a small parasitic resistance due to employing optimized Co salicide technology. It also has a small parasitic capacitance due to a reduction in the overlapping region between the gate and drain, which is achieved by employing gate poly-Si oxidation prior to LDD implantation. We obtained an Ft of 78 GHz and an Fmax of 37 GHz for a 0. 1-µm-Leff DTMOS even at a supply voltage of 0.7 V. We also observed an Fmax enhancement by 1.5 times for a 0.12-µm-Leff DTMOS compared to a conventional SOI MOSFET, which we attributed to high transconductance and large output resistance. The DTMOS can be considered as the most promising device for low-power RF LSIs.
Daisuke MIYAZAKI Shoji KAWAHITO Yoshiaki TADOKORO
This paper presents a new scheme of a low-power area-efficient pipelined A/D converter using a single-ended amplifier. The proposed multiply-by-two single-ended amplifier using switched capacitor circuits has smaller DC bias current compared to the conventional fully-differential scheme, and has a small capacitor mismatch sensitivity, allowing us to use a smaller capacitance. The simple high-gain dynamic-biased regulated cascode amplifier also has an excellent switching response. These properties lead to the low-power area-efficient design of high-speed A/D converters. The estimated power dissipation of the 10-b pipelined A/D converter is less than 12 mW at 20 MSample/s.
In this paper, we propose a method, called PORT-D, for optimizing CMOS logic circuits to reduce the average power dissipation. PORT-D is an extensional method of PORT. While PORT reduces the average power dissipation under the zero delay model, PORT-D reduces the average power dissipation by taking into account of the gate delay. In PORT-D, the average power dissipation is estimated by the revised BDD traversal method. The revised BDD traversal method calculates switching activity of gate output by constructing OBDD's without representing switching condition of a gate output. PORT-D modifies the circuit in order to reduce the average power dissipation, where transformations which reduce the average power dissipation are found by using permissible functions. Experimental results for benchmark circuits show PORT-D reduces the average power dissipation more than the number of transistors. Furthermore, we modify PORT-D to have high power reduction capability. In the revised method, named PORT-MIX, a mixture strategy of PORT and PORT-D is implemented. Experimental results show PORT-MIX has higher power reduction capability and higher area optimization capability than PORT-D.
Nobutaro SHIBATA Hiroshi INOKAWA Keiichiro TOKUNAGA Soichi OHTA
High-speed and low-power techniques are described for megabit-class size-configurable CMOS SRAM macrocells. To shorten the design turn-around-time, the methodology of abutting nine kinds of leaf cells is employed; two-level via-hole programming and the array-address decoder embedded in each control leaf cell present a divided-memory-array structure. A new squashed-memory-cell architecture using trench isolation and stacked-via-holes is proposed to reduce access times and power dissipation. To shorten the time for writing data, per-bitline architecture is proposed, in which every bitline has a personal writing driver. Also, read-out circuitry using a current-sense-type two-stage sense amplifier is designed. The effect of the non-multiplexed bitline scheme for fast read-out is shown in a simulation result. To reduce the noise from the second- to first-stage amplifier due to a feedback loop, current paths are separated so as not to cause common impedance. To confirm the techniques described in this paper, a 1-Mb SRAM test chip was fabricated with an advanced 0.35-µm CMOS/bulk process. The SRAM has demonstrated 250-MHz operation with a 2.5-V typical power supply. Also, 100-mW power dissipation was obtained at a practical operating frequency of 150-MHz.